Search for: All records

Creators/Authors contains: "Bouchard, Kristofer"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be freely available during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Metagenomics is a technique for genome-wide profiling of microbiomes; it generates billions of DNA sequences called reads. Given the proliferation of metagenomic projects, computational tools are needed to classify metagenomic reads efficiently and accurately without constructing a reference database. The program DL-TODA presented here classifies metagenomic reads using a deep learning model trained on over 3000 bacterial species. A convolutional neural network architecture originally designed for computer vision was applied to model species-specific features. Using synthetic test data simulated from 2454 genomes spanning 639 species, DL-TODA classified nearly 75% of the reads with high confidence. Its classification accuracy exceeded 0.98 at taxonomic ranks above the genus level, comparable to Kraken2 and Centrifuge, two state-of-the-art taxonomic classification tools. DL-TODA also achieved an accuracy of 0.97 at the species level, higher than the 0.93 of Kraken2 and the 0.85 of Centrifuge on the same test set. Applying DL-TODA to human oral and cropland soil metagenomes further demonstrated its utility for analyzing microbiomes from diverse environments. Compared to Centrifuge and Kraken2, DL-TODA predicted distinct relative abundance rankings and was less biased toward a single taxon.
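    A rough sketch of the general idea behind this kind of classifier follows. It is not DL-TODA's published architecture or input encoding (see the paper for those); the one-hot encoding, layer shapes, read length, and class count below are all illustrative assumptions.

    import torch
    import torch.nn as nn

    BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

    def one_hot_read(read: str, length: int = 150) -> torch.Tensor:
        """Encode a DNA read as a 4 x length one-hot tensor; N/ambiguous bases stay zero."""
        x = torch.zeros(4, length)
        for i, base in enumerate(read[:length]):
            if base in BASES:
                x[BASES[base], i] = 1.0
        return x

    class ReadCNN(nn.Module):
        """Toy 1-D CNN mapping a fixed-length read to one of n_species labels."""
        def __init__(self, n_species: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(4, 64, kernel_size=9, padding=4), nn.ReLU(),
                nn.MaxPool1d(3),
                nn.Conv1d(64, 128, kernel_size=9, padding=4), nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),  # collapse to one feature vector per read
            )
            self.classifier = nn.Linear(128, n_species)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).squeeze(-1))

    model = ReadCNN(n_species=3000)                   # ~3000 species, as in the abstract
    batch = torch.stack([one_hot_read("ACGT" * 38)])  # one toy read, shape (1, 4, 150)
    probs = torch.softmax(model(batch), dim=-1)       # per-species scores for the read

    The softmax output acts as a per-read confidence score that can be thresholded, which is how classifiers of this kind report high-confidence assignments.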
  2. Progress in sequencing, microfluidics, and analysis strategies has revolutionized the granularity at which multicellular organisms can be studied. In particular, single-cell transcriptomics has led to fundamental new insights into animal biology, such as the discovery of new cell types and cell type-specific disease processes. However, the application of single-cell approaches to plants, fungi, algae, or bacteria (environmental organisms) has been far more limited, largely due to the challenges posed by polysaccharide walls surrounding these species' cells. In this perspective, we discuss opportunities afforded by single-cell technologies for energy and environmental science and grand challenges that must be tackled to apply these approaches to plants, fungi and algae. We highlight the need to develop better and more comprehensive single-cell technologies, analysis and visualization tools, and tissue preparation methods. We advocate for the creation of a centralized, open-access database to house plant single-cell data. Finally, we consider how such efforts should balance the need for deep characterization of select model species while still capturing the diversity in the plant kingdom. Investments into the development of methods, their application to relevant species, and the creation of resources to support data dissemination will enable groundbreaking insights to propel energy and environmental science forward.
  3. A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality, including iterative data writes, lazy data loading, and parallel I/O. It also supports optimized data storage through chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice in the design of NWB 2.0, a modern data standard for collaborative science across the neurophysiology community.
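    To make the three-component division of labor concrete, here is a brief sketch using PyNWB, the NWB reference API built on HDMF. The H5DataIO wrapper requests storage options such as chunking and compression at write time without changing the data API call; the file name, metadata values, and array sizes are illustrative assumptions, not prescribed usage.

    from datetime import datetime, timezone

    import numpy as np
    from hdmf.backends.hdf5.h5_utils import H5DataIO
    from pynwb import NWBFile, NWBHDF5IO, TimeSeries

    nwbfile = NWBFile(
        session_description="demo session",            # illustrative metadata
        identifier="demo-001",
        session_start_time=datetime.now(timezone.utc),
    )

    # Component 2 (storage): H5DataIO wraps the array with HDF5 storage options,
    # here a chunked layout and gzip compression, without touching the data API.
    data = H5DataIO(
        data=np.random.randn(100_000),
        chunks=(4096,),
        compression="gzip",
    )

    # Component 3 (data API): the TimeSeries call is identical with or without the wrapper.
    nwbfile.add_acquisition(
        TimeSeries(name="voltage", data=data, unit="volts",
                   starting_time=0.0, rate=30000.0)
    )

    # Component 1 (specification) is baked into the NWB schema; the HDF5 backend
    # maps the containers onto the file at write time.
    with NWBHDF5IO("demo.nwb", mode="w") as io:
        io.write(nwbfile)

    The same TimeSeries call works whether or not the array is wrapped in H5DataIO: the object mapping layer keeps storage concerns separate from the data API, which is the insulation the abstract describes.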